Skip to content

[Fix][E2E-MongoDB CDC] Stabilize concurrent submission test and ensure CI trigger#10474

Open
yzeng1618 wants to merge 4 commits intoapache:devfrom
yzeng1618:dev-mongdb-ci
Open

[Fix][E2E-MongoDB CDC] Stabilize concurrent submission test and ensure CI trigger#10474
yzeng1618 wants to merge 4 commits intoapache:devfrom
yzeng1618:dev-mongdb-ci

Conversation

@yzeng1618
Copy link
Collaborator

@yzeng1618 yzeng1618 commented Feb 9, 2026

Purpose of this pull request

This PR fixes a flaky MongoDB CDC E2E case and ensures the related CI job is triggered when this module changes.

企业微信截图_17706094909783

Root cause: The cleanSourceTable() function executes Mongo deleteMany and MySQL TRUNCATE simultaneously. When the job is running continuously, the sink is emptied and then backfilled to 2/5 by replay/upsert within the stream, resulting in a persistent inconsistency where the source is empty but the sink is not.

Does this PR introduce any user-facing change?

No. This PR only changed this test to keep the same goal, but remove a flaky step.

What changed:

  • Keep the first cleanSourceTable() at test start.
  • Add a quick check after async submit: if either task already failed, fail immediately.
  • In the second phase, replace cleanSourceTable() with appendIncrementalSourceTableData().

Why:

  • cleanSourceTable() clears both source and sink while streaming jobs are still running. That can create race conditions and cause unstable expected 0 / actual N timeouts.
  • Appending new CDC records is deterministic and still proves both concurrent tasks keep running correctly without cache interference.

How was this patch tested?

Check list

@yzeng1618 yzeng1618 requested review from Carl-Zhou-CN and zhangshenghang and removed request for zhangshenghang February 9, 2026 10:26
@DanielCarter-stack
Copy link

Issue 1: Missing method documentation comments

Location: seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-mongodb-e2e/src/test/java/mongodb/MongodbCDCIT.java:670

private void cleanMongoSourceTable() {
    mongodbContainer.executeCommandFileInDatabase("inventoryClean", MONGODB_DATABASE);
}

Related context:

  • Caller: testMongodbCdcMultiTaskConcurrentSubmission (line 296)
  • Caller: cleanSourceTable() (line 675)
  • Related method: cleanSourceTable() (lines 674-678)

Problem description:
The newly added cleanMongoSourceTable() method lacks JavaDoc comments. Although the code intent is relatively clear, missing documentation causes the following problems for other maintainers:

  1. Unclear why a separate "MongoDB-only clear" method is needed
  2. Unclear when to use cleanMongoSourceTable() vs cleanSourceTable() in different scenarios
  3. Difficult to understand the background and root cause of this fix

Potential risks:

  • Risk 1: Other developers may misuse these two methods, using cleanSourceTable() in concurrent tests causing tests to become unstable again
  • Risk 2: Code reviewers may not understand the necessity of this modification, considering it unnecessary refactoring

Impact scope:

  • Direct impact: Maintainability of MongodbCDCIT class
  • Indirect impact: Future modifications and extensions to MongoDB CDC tests
  • Affected area: Single E2E test module

Severity: MINOR

Improvement suggestions:

/**
 * Cleans only the MongoDB source collections without touching the MySQL sink tables.
 * This is specifically used in concurrent CDC tests to avoid race conditions where:
 * <ul>
 *   <li>TRUNCATE on MySQL happens immediately</li>
 *   <li>CDC stream continues processing and may replay/upsert data to the emptied table</li>
 *   <li>Result: source is empty but sink is partially filled (e.g., 2/5 records)</li>
 * </ul>
 * 
 * <p>By only cleaning MongoDB source, the CDC stream naturally stops producing new events,
 * allowing the test to verify final consistency without interference from replay operations.
 * 
 * <p>Use {@link #cleanSourceTable()} when you need to clean both source and sink.
 * 
 * @see #cleanSourceTable()
 * @see #testMongodbCdcMultiTaskConcurrentSubmission()
 */
private void cleanMongoSourceTable() {
    mongodbContainer.executeCommandFileInDatabase("inventoryClean", MONGODB_DATABASE);
}

Rationale:

  • Explains why this method is needed (avoid race conditions)
  • Describes CDC's replay/upsert mechanism
  • Clarifies usage scenarios
  • Provides cross-references to related methods
  • Helps future maintainers understand the background of this fix

Issue 2: CI trigger conditions may be too broad

Location: .github/workflows/backend.yml:765

all-connectors-it-1:
  needs: [ changes, sanity-check ]
  if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 'true' || contains(needs.changes.outputs.it-modules, 'connector-cdc-mongodb-e2e')

Related context:

  • Modified job: all-connectors-it-1 (lines 763-792)
  • Related jobs: all-connectors-it-2 to all-connectors-it-7 (lines 794-978)
  • Change detection logic: changes job (lines 120-297)

Problem description:
The PR adds special handling for the connector-cdc-mongodb-e2e module in all-connectors-it-1, but this presents several problems:

  1. Inconsistency: Other MongoDB CDC-related test jobs (such as oracle-cdc-connector-it, kudu-connector-it, etc.) use similar patterns, but this modification only targets all-connectors-it-1

  2. Incomplete coverage: If the connector-cdc-mongodb-e2e module is modified, all related connector test jobs should be triggered, not just all-connectors-it-1

  3. Naming confusion: The naming of all-connectors-it-1 indicates it should run part of all connector tests, but this condition makes it also run when MongoDB CDC changes, which may not align with naming semantics

Potential risks:

  • Risk 1: When only connector-cdc-mongodb-e2e is modified, not all necessary tests may be triggered
  • Risk 2: all-connectors-it-1 may run unnecessary tests, wasting CI resources
  • Risk 3: This pattern may need similar modifications in other places, but may be overlooked

Impact scope:

  • Direct impact: GitHub Actions CI/CD workflow
  • Indirect impact: Completeness of PR checks and CI runtime
  • Affected area: CI/CD infrastructure

Severity: MINOR

Improvement suggestions:
Consider one of the following two approaches:

Option A: Create dedicated MongoDB CDC E2E job

mongodb-cdc-connector-it:
  needs: [ changes, sanity-check ]
  if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 'true' || contains(needs.changes.outputs.it-modules, 'connector-cdc-mongodb-e2e')
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      java: [ '8', '11' ]
      os: [ 'ubuntu-latest' ]
  timeout-minutes: 180
  steps:
    # ... Steps similar to all-connectors-it, but only runs connector-cdc-mongodb-e2e

Option B: Add conditions in multiple all-connectors-it jobs

# Add the same condition to all-connectors-it-1, 2, ..., 7
if: needs.changes.outputs.api == 'true' || needs.changes.outputs.engine == 'true' || contains(needs.changes.outputs.it-modules, 'connector-cdc-mongodb-e2e')

Rationale:

  • Option A is clearer, with a dedicated job responsible for MongoDB CDC testing
  • Option B ensures all connector test jobs run when MongoDB CDC changes
  • Current modification may not be complete enough, requires CI/CD maintainer review

Note: This requires deep understanding of the project's CI/CD architecture and strategy. Recommend discussing with CI/CD maintainers.


Issue 3: Fix may be incomplete - other test methods may also be affected

Location: seatunnel-e2e/seatunnel-connector-v2-e2e/connector-cdc-mongodb-e2e/src/test/java/mongodb/MongodbCDCIT.java:183, 200, 208, 226

@TestTemplate
public void testMongodbCdcToMysqlCheckDataE2e(TestContainer container)
        throws InterruptedException {
    cleanSourceTable();  // Line 183
    // ...
    cleanSourceTable();  // Line 200 - Called after running CDC task
    TimeUnit.SECONDS.sleep(20);
    assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
}

@TestTemplate
public void testMongodbCdcMultiTableToMysqlE2e(TestContainer container)
        throws InterruptedException {
    cleanSourceTable();  // Line 208
    // ...
    cleanSourceTable();  // Line 226 - Called after running CDC task
    TimeUnit.SECONDS.sleep(20);
    assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
    assertionsSourceAndSink(MONGODB_COLLECTION_2, SINK_SQL_ORDERS);
}

Related context:

  • testMongodbCdcMultiTaskConcurrentSubmission() (lines 255-300) - Fixed
  • testMongodbCdcToMysqlCheckDataE2e() (lines 180-203) - Not fixed
  • testMongodbCdcMultiTableToMysqlE2e() (lines 205-252) - Not fixed
  • testSavepointRecovery() (lines 346-380) - Not fixed
  • testResumeTokenFailureRecovery() (lines 388-438) - Not fixed

Problem description:
The PR only fixed the issue in testMongodbCdcMultiTaskConcurrentSubmission(), but similar patterns exist in other test methods:

  1. testMongodbCdcToMysqlCheckDataE2e() (line 200):

    • Calls cleanSourceTable() after CDC task runs
    • Then waits 20 seconds and verifies data consistency
    • This scenario is similar to concurrent testing and may also be affected by the same race condition
  2. testMongodbCdcMultiTableToMysqlE2e() (line 226):

    • Calls cleanSourceTable() after CDC task runs
    • Verifies data consistency of two collections
    • Similarly may be affected by race conditions

Potential risks:

  • Risk 1: These test methods may also become unstable due to the same race condition
  • Risk 2: If these tests start failing, maintainers may not understand the root cause
  • Risk 3: Incomplete fix causes the problem to reappear in other places

Impact scope:

  • Direct impact: Stability of other MongoDB CDC E2E tests
  • Indirect impact: CI/CD reliability
  • Affected area: Multiple E2E test methods

Severity: MAJOR

Improvement suggestions:

Option A: Uniformly fix all similar scenarios

@TestTemplate
public void testMongodbCdcToMysqlCheckDataE2e(TestContainer container)
        throws InterruptedException {
    cleanSourceTable();
    CompletableFuture.supplyAsync(
            () -> {
                try {
                    container.executeJob("/mongodbcdc_to_mysql.conf");
                } catch (Exception e) {
                    log.error("Commit task exception :" + e.getMessage());
                    throw new RuntimeException();
                }
                return null;
            });
    TimeUnit.SECONDS.sleep(10);
    // insert update delete
    upsertDeleteSourceTable();
    TimeUnit.SECONDS.sleep(20);
    assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);

    // Fix: Only clear MongoDB after running CDC task
    cleanMongoSourceTable();  // Change to cleanMongoSourceTable()
    TimeUnit.SECONDS.sleep(20);
    assertionsSourceAndSink(MONGODB_COLLECTION_1, SINK_SQL_PRODUCTS);
}

Option B: Analyze each test scenario, selectively fix
Need to analyze the expected behavior of each test:

  • If the test expects to clear source and target then verify, keep cleanSourceTable()
  • If the test expects to clear source and let CDC stop naturally, change to cleanMongoSourceTable()

Option C: Add comments explaining why not modifying
If certain tests truly need to clear both source and target, comments should be added explaining why:

// This test intentionally cleans both source and sink to verify CDC can recover
// from a complete reset scenario.
cleanSourceTable();

Rationale:

  • Ensures all test methods follow a consistent pattern
  • Avoids future instability issues
  • Improves test suite reliability
  • Helps maintainers understand why cleanSourceTable() is used in some places while cleanMongoSourceTable() is used in others

Suggested actions:

  1. Review all locations using cleanSourceTable()
  2. Analyze the expected behavior of each scenario
  3. Uniformly fix the pattern or add clear comments
  4. Run the complete test suite to verify modifications

@github-actions github-actions bot removed the CI&CD label Feb 10, 2026
@yzeng1618
Copy link
Collaborator Author

Modify backend.yaml to enable the CI pipeline to cover the E2E tests for all-connectors-it-1. The CI pipeline now runs successfully as expected.

https://github.com/yzeng1618/seatunnel-dev/actions/runs/21851023643
image

Copy link
Collaborator

@chl-wxp chl-wxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@davidzollo davidzollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
good job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants